Mako hSDM BRT explore (Background PAs)

Author

Emily Nazario

Published

June 28, 2024

On this document, I’ve included the results from the initial exploration into the different model outputs, ranking of covariate influence, performance metrics, and prediction maps. This first set of models only includes extracted covariate data at a daily temporal resolution, but I am also considering exploring models that include covariate data at a seasonal or annual temporal resolution. The pseudo absences used in these models were generated using background sampling approaches. Lastly, hyperparameters were tuned using the caret package and across all models, a learning rate of 0.05 and tree complexity of 3 resulted in the highest accuracy. Lastly, the ‘pred_var’ predictor is a random set of numbers that will be used to identify which predictor variables should be included in the final model, and which are not informative.

The hypotheses I would like to test with these models are as follows:

H1: The AGI model will perform better than the dissolved oxygen and null model, and the dissolved oxygen model will perform better than the null model.

study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?

H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.

study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?

H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.

study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?

H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.

study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?

H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.

study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?

Base models

These three models represent three different options for the base model and either include spatial predictors, a tag ID predictor, both, or neither. These models were developed by splitting the data set into 75/25 train/test, and thus that is the model evaluation approach used here. However, once a model is selected, I can run additional evaluation metrics (i.e., LOO, k-fold). I can also complete these now depending on when that is typically performed.

base_Nspat_Ntag <- readRDS(brt_outputs[7])

# model performance 
ggBRT::ggPerformance(base_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862741
Residual.Deviance  0.2861955
Correlation        0.9282161
AUC                0.9929000
Per.Expl          79.3550598
cvDeviance         0.5921329
cvCorrelation      0.8021263
cvAUC              0.9463200
cvPer.Expl        57.2860162
#relative influence of predictors
ggBRT::ggInfluence(base_Nspat_Ntag) 

             rel.inf
bathy_mean 37.846057
temp_mean  24.062148
sal_mean    7.155808
chl_mean    6.204518
ssh_mean    5.944164
uostr_mean  4.190075
vostr_mean  3.770208
bathy_sd    2.868287
mld_mean    2.643751
uo_mean     2.213286
vo_mean     1.880648
pred_var    1.221051
#explore partial plots
gbm.plot(base_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Nspat_Ntag)
base_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1         10 bathy_mean          2  temp_mean   854.94
2         10 bathy_mean          3   sal_mean   698.33
3         10 bathy_mean          8   ssh_mean   563.67
4          8   ssh_mean          2  temp_mean   551.11
5         10 bathy_mean          4    uo_mean   540.01
6          3   sal_mean          2  temp_mean   399.47
7          2  temp_mean          1   chl_mean   346.97
#predictive performance using test dataset 
preds <- predict.gbm(base_Nspat_Ntag, dat_test_base,
                     n.trees = base_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.3752151
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7396679
max(e@TPR + e@TNR -1) #TSS
[1] 0.8720777
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Nspat_Ntag)
[1] 79.35506
#eval 75/25
eval_7525_modified(base_Nspat_Ntag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4450 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2282071 0.8916574 0.9800992 0.9987515         0.7293395 0.7935506
base_Nspat_Ytag <- readRDS(brt_outputs[8])

# model performance 
ggBRT::ggPerformance(base_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862741
Residual.Deviance  0.1110165
Correlation        0.9807084
AUC                0.9998000
Per.Expl          91.9917378
cvDeviance         0.3495621
cvCorrelation      0.8963288
cvAUC              0.9783500
cvPer.Expl        74.7840586
#relative influence of predictors
ggBRT::ggInfluence(base_Nspat_Ytag) 

              rel.inf
bathy_mean 32.1041550
tag        24.5344527
temp_mean  18.3478756
uostr_mean  5.0132259
ssh_mean    4.4470038
sal_mean    4.3156858
chl_mean    3.5816288
vostr_mean  2.8189697
bathy_sd    1.2908042
vo_mean     1.2581968
uo_mean     0.9661739
mld_mean    0.8414705
pred_var    0.4803574
#explore partial plots
gbm.plot(base_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Nspat_Ytag)
base_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1          4   sal_mean          1        tag  1956.77
2         11 bathy_mean          1        tag   827.44
3          3  temp_mean          1        tag   701.21
4          2   chl_mean          1        tag   665.02
5          9   ssh_mean          1        tag   579.04
6          7    vo_mean          1        tag   394.71
7         11 bathy_mean          3  temp_mean   305.91
8          6 uostr_mean          1        tag   288.92
#predictive performance using test dataset 
preds <- predict.gbm(base_Nspat_Ytag, dat_test_base,
                     n.trees = base_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.1818552
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7593379
max(e@TPR + e@TNR -1) #TSS
[1] 0.9572719
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Nspat_Ytag)
[1] 91.99174
#eval 75/25
eval_7525_modified(base_Nspat_Ytag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1469908 0.956733 0.9939051 0.9935086         0.8688192 0.9199174
base_Yspat_Ytag <- readRDS(brt_outputs[9])

# model performance 
ggBRT::ggPerformance(base_Yspat_Ytag)
                      Model 1
Total.Deviance     1.38627408
Residual.Deviance  0.09036597
Correlation        0.98482457
AUC                0.99990000
Per.Expl          93.48137767
cvDeviance         0.29733241
cvCorrelation      0.91408473
cvAUC              0.98300000
cvPer.Expl        78.55168620
#relative influence of predictors
ggBRT::ggInfluence(base_Yspat_Ytag) 

              rel.inf
dist_coast 52.3832103
tag        20.2478553
lat         9.1003879
temp_mean   4.2416317
bathy_mean  3.5147582
chl_mean    2.7750598
sal_mean    2.0626823
ssh_mean    1.2159874
vostr_mean  1.1867095
uo_mean     0.6788234
vo_mean     0.6673415
uostr_mean  0.5772978
bathy_sd    0.5066506
mld_mean    0.5003317
pred_var    0.3412726
#explore partial plots
gbm.plot(base_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Yspat_Ytag)
base_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   651.04
2           3   chl_mean          1        tag   595.00
3          12 bathy_mean          1        tag   544.74
4           5   sal_mean          1        tag   499.41
5          14 dist_coast          1        tag   381.15
6           9 vostr_mean          1        tag   319.69
7           8    vo_mean          1        tag   285.70
8           4  temp_mean          1        tag   249.85
9          10   ssh_mean          1        tag   193.53
10         13   bathy_sd          1        tag   163.65
11          6    uo_mean          1        tag   123.96
#predictive performance using test dataset 
preds <- predict.gbm(base_Yspat_Ytag, dat_test_base,
                     n.trees = base_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.1536541
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7709545
max(e@TPR + e@TNR -1) #TSS
[1] 0.965249
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Yspat_Ytag)
[1] 93.48138
#eval 75/25
eval_7525_modified(base_Yspat_Ytag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1339725 0.9641455 0.9954561 0.9899414          0.889162 0.9348138

DO models

I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include DO and the other environmental predictor variables as longer time scales (seasonal/annual).

do_0m_Nspat_Ytag <- readRDS(brt_outputs[14])

# model performance 
ggBRT::ggPerformance(do_0m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38629302
Residual.Deviance  0.08569242
Correlation        0.98673362
AUC                1.00000000
Per.Expl          93.81859271
cvDeviance         0.30444849
cvCorrelation      0.91329019
cvAUC              0.98270000
cvPer.Expl        78.03866260
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Nspat_Ytag) 

              rel.inf
bathy_mean 31.3188924
o2_mean_0m 25.6922130
tag        20.8632526
temp_mean   4.9825010
ssh_mean    3.9449855
chl_mean    3.6738368
vostr_mean  2.1762902
uostr_mean  1.9424041
sal_mean    1.8962177
bathy_sd    0.9210810
mld_mean    0.8088337
uo_mean     0.7033574
vo_mean     0.6742398
pred_var    0.4018948
#explore partial plots
gbm.plot(do_0m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Nspat_Ytag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           5   sal_mean          1        tag  1317.79
2          12 bathy_mean          1        tag   898.27
3           2 o2_mean_0m          1        tag   753.94
4           4  temp_mean          2 o2_mean_0m   731.95
5          13   bathy_sd          1        tag   461.04
6           4  temp_mean          1        tag   452.00
7           3   chl_mean          1        tag   350.97
8          10   ssh_mean          1        tag   322.16
9           9 vostr_mean          1        tag   308.68
10          8    vo_mean          1        tag   280.17
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1311332
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7691736
max(e@TPR + e@TNR -1) #TSS
[1] 0.9755573
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Nspat_Ytag)
[1] 93.81859
#eval 75/25
eval_7525_modified(do_0m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1200159 0.9715853 0.9970746  0.998006         0.9054067 0.9381859
do_0m_Yspat_Ytag <- readRDS(brt_outputs[15])

# model performance 
ggBRT::ggPerformance(do_0m_Yspat_Ytag)
                      Model 1
Total.Deviance     1.38629302
Residual.Deviance  0.07111301
Correlation        0.98956599
AUC                1.00000000
Per.Expl          94.87027583
cvDeviance         0.26513270
cvCorrelation      0.92577703
cvAUC              0.98583000
cvPer.Expl        80.87470024
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Yspat_Ytag) 

              rel.inf
dist_coast 52.2693314
tag        18.2789790
o2_mean_0m  9.9461704
lat         7.1842134
bathy_mean  3.0959517
chl_mean    1.9510753
sal_mean    1.4925425
temp_mean   1.2416253
vostr_mean  0.9383412
ssh_mean    0.8462597
vo_mean     0.5322002
bathy_sd    0.5206533
uo_mean     0.5128695
mld_mean    0.4554318
uostr_mean  0.4344147
pred_var    0.2999407
#explore partial plots
gbm.plot(do_0m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Yspat_Ytag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           3 o2_mean_0m          1        tag   681.29
2           2        lat          1        tag   614.15
3          15 dist_coast          1        tag   509.20
4          13 bathy_mean          1        tag   470.74
5           5  temp_mean          3 o2_mean_0m   461.63
6           4   chl_mean          1        tag   409.48
7           6   sal_mean          1        tag   393.97
8          14   bathy_sd          1        tag   350.26
9          10 vostr_mean          1        tag   252.37
10          9    vo_mean          1        tag   250.86
11         15 dist_coast          3 o2_mean_0m   214.24
12         11   ssh_mean          1        tag   202.54
13          5  temp_mean          1        tag   189.74
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Yspat_Ytag, dat_test_do,
                     n.trees = do_0m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1138218
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7808104
max(e@TPR + e@TNR -1) #TSS
[1] 0.977085
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Yspat_Ytag)
[1] 94.87028
#eval 75/25
eval_7525_modified(do_0m_Yspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5500 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1124583 0.9749439 0.9977097 0.9977912         0.9178943 0.9487028
do_0m_60m_Nspat_Ytag <- readRDS(brt_outputs[13])

# model performance 
ggBRT::ggPerformance(do_0m_60m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38629302
Residual.Deviance  0.06921534
Correlation        0.99039620
AUC                1.00000000
Per.Expl          95.00716346
cvDeviance         0.28881667
cvCorrelation      0.91957663
cvAUC              0.98372000
cvPer.Expl        79.16626123
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_Nspat_Ytag) 

               rel.inf
bathy_mean  28.6020233
o2_mean_0m  25.7568149
tag         19.2681616
o2_mean_60m 10.5230432
ssh_mean     3.9366672
chl_mean     3.0096521
temp_mean    1.8833026
sal_mean     1.7203767
uostr_mean   1.2905618
vostr_mean   1.2578356
mld_mean     0.6504126
bathy_sd     0.6501386
vo_mean      0.6103547
uo_mean      0.5075413
pred_var     0.3331139
#explore partial plots
gbm.plot(do_0m_60m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_Nspat_Ytag)
do_int$rank.list
   var1.index  var1.names var2.index var2.names int.size
1          12  bathy_mean          1        tag   827.96
2           5    sal_mean          1        tag   765.02
3           2  o2_mean_0m          1        tag   726.73
4          14 o2_mean_60m          1        tag   465.01
5           4   temp_mean          2 o2_mean_0m   448.02
6           3    chl_mean          1        tag   401.11
7           4   temp_mean          1        tag   368.01
8           9  vostr_mean          1        tag   330.29
9          13    bathy_sd          1        tag   285.54
10         10    ssh_mean          1        tag   258.37
11          8     vo_mean          1        tag   217.71
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1180656
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7723101
max(e@TPR + e@TNR -1) #TSS
[1] 0.9775898
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_Nspat_Ytag)
[1] 95.00716
#eval 75/25
eval_7525_modified(do_0m_60m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6150 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1131281 0.9746304 0.9972402 0.9972539         0.9148331 0.9500716
do_0m_250m_Nspat_Ytag <- readRDS(brt_outputs[10])

# model performance 
ggBRT::ggPerformance(do_0m_250m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862930
Residual.Deviance  0.0772215
Correlation        0.9880214
AUC                1.0000000
Per.Expl          94.4296408
cvDeviance         0.2923215
cvCorrelation      0.9180102
cvAUC              0.9832900
cvPer.Expl        78.9134402
#relative influence of predictors
ggBRT::ggInfluence(do_0m_250m_Nspat_Ytag) 

                rel.inf
o2_mean_0m   26.7124835
bathy_mean   24.5325773
tag          19.6354077
o2_mean_250m 16.3256290
ssh_mean      2.3877300
chl_mean      2.0737228
temp_mean     1.9080636
sal_mean      1.5922732
vostr_mean    1.0597981
bathy_sd      0.9220362
uostr_mean    0.8579829
vo_mean       0.6550567
uo_mean       0.5147781
mld_mean      0.5111500
pred_var      0.3113108
#explore partial plots
gbm.plot(do_0m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_250m_Nspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1           5     sal_mean          1        tag  1083.83
2           2   o2_mean_0m          1        tag   810.86
3           4    temp_mean          2 o2_mean_0m   611.00
4          14 o2_mean_250m          1        tag   533.46
5          12   bathy_mean          1        tag   465.44
6           4    temp_mean          1        tag   407.57
7           3     chl_mean          1        tag   361.70
8          13     bathy_sd          1        tag   323.33
9          10     ssh_mean          1        tag   307.93
10          9   vostr_mean          1        tag   257.21
11         14 o2_mean_250m          2 o2_mean_0m   241.29
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_250m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1232636
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7771163
max(e@TPR + e@TNR -1) #TSS
[1] 0.9760588
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_250m_Nspat_Ytag)
[1] 94.42964
#eval 75/25
eval_7525_modified(do_0m_250m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1166426 0.9730488 0.9972335 0.9974503         0.9110834 0.9442964
do_0m_60m_250m_Nspat_Ytag <- readRDS(brt_outputs[11])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38629302
Residual.Deviance  0.06405767
Correlation        0.99133355
AUC                1.00000000
Per.Expl          95.37921112
cvDeviance         0.27651361
cvCorrelation      0.92323053
cvAUC              0.98486000
cvPer.Expl        80.05373977
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Nspat_Ytag) 

                rel.inf
o2_mean_0m   26.3744059
bathy_mean   24.2567146
tag          18.4164082
o2_mean_250m 12.9309717
o2_mean_60m   5.8644578
ssh_mean      3.4303989
chl_mean      1.8947511
temp_mean     1.3576031
sal_mean      1.2825109
uostr_mean    0.8488525
vostr_mean    0.8461226
bathy_sd      0.6934417
vo_mean       0.5279284
mld_mean      0.5255140
uo_mean       0.4617161
pred_var      0.2882026
#explore partial plots
gbm.plot(do_0m_60m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Nspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1           2   o2_mean_0m          1        tag   756.09
2           5     sal_mean          1        tag   744.60
3           4    temp_mean          2 o2_mean_0m   681.68
4          12   bathy_mean          1        tag   457.99
5          15 o2_mean_250m          1        tag   423.21
6          14  o2_mean_60m          1        tag   351.76
7           3     chl_mean          1        tag   331.31
8           4    temp_mean          1        tag   322.01
9          13     bathy_sd          1        tag   241.45
10          9   vostr_mean          1        tag   237.92
11          8      vo_mean          1        tag   230.84
12         10     ssh_mean          1        tag   227.37
13          7   uostr_mean          1        tag   207.53
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1131788
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7783925
max(e@TPR + e@TNR -1) #TSS
[1] 0.9786226
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Nspat_Ytag)
[1] 95.37921
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6000 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1111881 0.9754482 0.9973543 0.9967165         0.9183581 0.9537921
do_0m_60m_250m_Yspat_Ytag <- readRDS(brt_outputs[12])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Yspat_Ytag)
                      Model 1
Total.Deviance     1.38629302
Residual.Deviance  0.06107201
Correlation        0.99179288
AUC                1.00000000
Per.Expl          95.59458127
cvDeviance         0.26289954
cvCorrelation      0.92771979
cvAUC              0.98575000
cvPer.Expl        81.03578841
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Yspat_Ytag) 

                rel.inf
dist_coast   50.7776327
tag          17.4708084
o2_mean_0m   10.2952610
o2_mean_250m  4.9246556
lat           4.4525242
o2_mean_60m   2.8631545
chl_mean      1.8981111
bathy_mean    1.7092848
sal_mean      1.0541439
temp_mean     0.9245410
vostr_mean    0.6430521
ssh_mean      0.6087970
vo_mean       0.4919662
uostr_mean    0.4554889
bathy_sd      0.4404908
mld_mean      0.4397390
uo_mean       0.3089437
pred_var      0.2414052
#explore partial plots
gbm.plot(do_0m_60m_250m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Yspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1           3   o2_mean_0m          1        tag   639.99
2           6     sal_mean          1        tag   381.15
3          14     bathy_sd          1        tag   369.34
4          15   dist_coast          1        tag   365.12
5           2          lat          1        tag   346.60
6          17 o2_mean_250m          1        tag   306.59
7           4     chl_mean          1        tag   290.18
8          13   bathy_mean          1        tag   285.08
9          16  o2_mean_60m          5  temp_mean   253.71
10         16  o2_mean_60m          1        tag   244.79
11          5    temp_mean          3 o2_mean_0m   222.92
12         10   vostr_mean          1        tag   215.04
13          9      vo_mean          1        tag   167.52
14          8   uostr_mean          1        tag   116.74
15         15   dist_coast          3 o2_mean_0m   109.89
16         11     ssh_mean          1        tag   109.82
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Yspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_250m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1026398
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7833562
max(e@TPR + e@TNR -1) #TSS
[1] 0.978366
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Yspat_Ytag)
[1] 95.59458
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Yspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5700 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1067755 0.9773981 0.9982434 0.9959618         0.9259605 0.9559458

AGI models

I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include AGI and the other environmental predictor variables as longer time scales (seasonal/annual).

agi_0m_Nspat_Ytag <- readRDS(brt_outputs[5])

# model performance 
ggBRT::ggPerformance(agi_0m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.09058218
Correlation        0.98574976
AUC                1.00000000
Per.Expl          93.46585281
cvDeviance         0.31679477
cvCorrelation      0.91027275
cvAUC              0.98076000
cvPer.Expl        77.14800254
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Nspat_Ytag) 

              rel.inf
bathy_mean 31.7564227
tag        22.4317739
temp_mean  19.1472195
uostr_mean  4.6441372
ssh_mean    4.6035826
AGI_0m      4.3344168
sal_mean    3.7523624
chl_mean    2.9185772
vostr_mean  2.4429168
bathy_sd    1.2895177
vo_mean     0.8922124
mld_mean    0.7121506
uo_mean     0.6928352
pred_var    0.3818752
#explore partial plots
gbm.plot(agi_0m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1670.77
2           4   sal_mean          1        tag  1358.50
3           3  temp_mean          1        tag   796.37
4          11 bathy_mean          1        tag   716.67
5          12   bathy_sd          1        tag   500.56
6           8 vostr_mean          1        tag   360.82
7           2   chl_mean          1        tag   350.27
8          11 bathy_mean          3  temp_mean   343.69
9           9   ssh_mean          1        tag   293.85
10          7    vo_mean          1        tag   283.73
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1426735
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7635415
max(e@TPR + e@TNR -1) #TSS
[1] 0.9696389
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Nspat_Ytag)
[1] 93.46585
#eval 75/25
eval_7525_modified(agi_0m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5900 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1269187 0.9680701 0.9963482 0.9991957         0.8970815 0.9346585
agi_0m_Yspat_Ytag <- readRDS(brt_outputs[6])

# model performance 
ggBRT::ggPerformance(agi_0m_Yspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.07536563
Correlation        0.98861806
AUC                1.00000000
Per.Expl          94.56349886
cvDeviance         0.28457545
cvCorrelation      0.92193385
cvAUC              0.98308000
cvPer.Expl        79.47214380
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Yspat_Ytag) 

              rel.inf
dist_coast 52.6885132
tag        19.0100569
lat         9.3755080
temp_mean   3.8940828
bathy_mean  3.1187408
AGI_0m      2.5281708
chl_mean    2.1679033
sal_mean    2.0794485
ssh_mean    1.1326443
vostr_mean  0.8369160
uostr_mean  0.6860411
bathy_sd    0.6046589
vo_mean     0.5672036
uo_mean     0.5434052
mld_mean    0.4635287
pred_var    0.3031779
#explore partial plots
gbm.plot(agi_0m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Yspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   618.71
2           5   sal_mean          1        tag   458.94
3           3   chl_mean          1        tag   457.19
4          12 bathy_mean          1        tag   444.88
5          15 dist_coast          1        tag   353.77
6           4  temp_mean          1        tag   314.80
7          13   bathy_sd          1        tag   307.47
8           8    vo_mean          1        tag   282.40
9           9 vostr_mean          1        tag   277.46
10         14     AGI_0m          4  temp_mean   224.72
11         14     AGI_0m          1        tag   223.58
12         10   ssh_mean          1        tag   155.83
13          7 uostr_mean          1        tag   125.51
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Yspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1235775
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7758431
max(e@TPR + e@TNR -1) #TSS
[1] 0.9758245
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Yspat_Ytag)
[1] 94.5635
#eval 75/25
eval_7525_modified(agi_0m_Yspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5550 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained PseudoR2
1 0.1172909 0.9726691 0.996907 0.9962742         0.9108565 0.945635
agi_0m_60m_Nspat_Ytag <- readRDS(brt_outputs[4])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.08163925
Correlation        0.98784986
AUC                1.00000000
Per.Expl          94.11095093
cvDeviance         0.30657340
cvCorrelation      0.91435083
cvAUC              0.98164000
cvPer.Expl        77.88532168
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_Nspat_Ytag) 

              rel.inf
bathy_mean 31.0165218
tag        22.7089581
temp_mean  19.3123505
uostr_mean  4.0921717
AGI_0m      4.0228983
AGI_60m     3.9028901
sal_mean    3.7019688
ssh_mean    3.0607407
vostr_mean  2.4285168
chl_mean    2.2359268
bathy_sd    1.0013958
vo_mean     0.8073497
uo_mean     0.7416074
mld_mean    0.6546704
pred_var    0.3120329
#explore partial plots
gbm.plot(agi_0m_60m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1546.61
2           4   sal_mean          1        tag  1160.06
3           3  temp_mean          1        tag   718.56
4          11 bathy_mean          1        tag   678.18
5          14    AGI_60m          1        tag   562.58
6          11 bathy_mean          3  temp_mean   375.86
7           2   chl_mean          1        tag   371.78
8           8 vostr_mean          1        tag   332.14
9          12   bathy_sd          1        tag   314.26
10          7    vo_mean          1        tag   270.59
11          9   ssh_mean          1        tag   248.80
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1313386
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7637301
max(e@TPR + e@TNR -1) #TSS
[1] 0.9707055
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_Nspat_Ytag)
[1] 94.11095
#eval 75/25
eval_7525_modified(agi_0m_60m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1207813 0.9711081 0.9967905   0.99765          0.905258 0.9411095
agi_0m_250m_Nspat_Ytag <- readRDS(brt_outputs[1])

# model performance 
ggBRT::ggPerformance(agi_0m_250m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.08329738
Correlation        0.98684251
AUC                1.00000000
Per.Expl          93.99134165
cvDeviance         0.29978280
cvCorrelation      0.91514338
cvAUC              0.98237000
cvPer.Expl        78.37516119
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_250m_Nspat_Ytag) 

              rel.inf
bathy_mean 25.7556370
tag        21.5992835
temp_mean  17.8343206
AGI_250m   12.5454181
uostr_mean  4.7132797
ssh_mean    4.0214353
AGI_0m      3.6118354
sal_mean    3.1006797
chl_mean    1.9387158
vostr_mean  1.4830631
bathy_sd    1.1646793
vo_mean     0.7268532
mld_mean    0.5851221
uo_mean     0.5700321
pred_var    0.3496450
#explore partial plots
gbm.plot(agi_0m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_250m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           4   sal_mean          1        tag  1107.60
2          13     AGI_0m          3  temp_mean   959.81
3           3  temp_mean          1        tag   653.96
4          14   AGI_250m          1        tag   483.70
5           2   chl_mean          1        tag   436.22
6          12   bathy_sd          1        tag   411.43
7          11 bathy_mean          1        tag   378.45
8           7    vo_mean          1        tag   342.67
9           8 vostr_mean          1        tag   323.33
10          9   ssh_mean          1        tag   260.62
11          6 uostr_mean          1        tag   213.62
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_250m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1374469
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7693346
max(e@TPR + e@TNR -1) #TSS
[1] 0.9704405
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_250m_Nspat_Ytag)
[1] 93.99134
#eval 75/25
eval_7525_modified(agi_0m_250m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.123622 0.9696011 0.9959497 0.9972633         0.9008517 0.9399134
agi_0m_60m_250m_Nspat_Ytag <- readRDS(brt_outputs[2])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.07572957
Correlation        0.98877144
AUC                1.00000000
Per.Expl          94.53724618
cvDeviance         0.29301289
cvCorrelation      0.91783184
cvAUC              0.98281000
cvPer.Expl        78.86350878
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Nspat_Ytag) 

              rel.inf
bathy_mean 26.8363780
tag        21.0957906
temp_mean  18.0262218
AGI_250m   12.1947644
uostr_mean  4.4840074
AGI_0m      3.8063516
ssh_mean    3.0616953
sal_mean    2.7962168
chl_mean    1.7130823
AGI_60m     1.4809786
vostr_mean  1.2200984
bathy_sd    1.0170479
vo_mean     0.8630534
uo_mean     0.5773383
mld_mean    0.5453203
pred_var    0.2816549
#explore partial plots
gbm.plot(agi_0m_60m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1152.68
2           4   sal_mean          1        tag  1005.00
3           3  temp_mean          1        tag   635.71
4          15   AGI_250m          1        tag   397.25
5           2   chl_mean          1        tag   396.66
6          14    AGI_60m          1        tag   391.13
7           7    vo_mean          1        tag   371.09
8          11 bathy_mean          1        tag   357.41
9          12   bathy_sd          1        tag   320.70
10          8 vostr_mean          1        tag   299.57
11          9   ssh_mean          1        tag   232.02
12         15   AGI_250m          3  temp_mean   161.28
13          6 uostr_mean          1        tag   142.65
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1286697
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.768278
max(e@TPR + e@TNR -1) #TSS
[1] 0.9727492
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Nspat_Ytag)
[1] 94.53725
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5700 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
     RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.11824 0.9722116 0.9961794 0.9980251         0.9071832 0.9453725
agi_0m_60m_250m_Yspat_Ytag <- readRDS(brt_outputs[3])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Yspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.07231845
Correlation        0.98904888
AUC                1.00000000
Per.Expl          94.78330693
cvDeviance         0.27235546
cvCorrelation      0.92438633
cvAUC              0.98474000
cvPer.Expl        80.35363316
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Yspat_Ytag) 

              rel.inf
dist_coast 52.2028237
tag        19.8286802
lat         8.5259754
temp_mean   3.6800178
AGI_250m    3.2802055
AGI_0m      2.2035478
bathy_mean  1.8557018
chl_mean    1.7402778
sal_mean    1.4654785
AGI_60m     1.1294626
ssh_mean    0.7267973
vostr_mean  0.7162280
uostr_mean  0.5488083
bathy_sd    0.5294415
vo_mean     0.5050275
mld_mean    0.4284863
uo_mean     0.3811785
pred_var    0.2518614
#explore partial plots
gbm.plot(agi_0m_60m_250m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Yspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   455.96
2          16    AGI_60m          1        tag   450.35
3           3   chl_mean          1        tag   425.15
4          14     AGI_0m          4  temp_mean   409.72
5          13   bathy_sd          1        tag   384.02
6           5   sal_mean          1        tag   331.25
7          12 bathy_mean          1        tag   330.47
8          17   AGI_250m          1        tag   285.00
9           9 vostr_mean          1        tag   237.90
10          4  temp_mean          1        tag   237.18
11          8    vo_mean          1        tag   234.93
12         14     AGI_0m          1        tag   169.22
13         15 dist_coast          1        tag   157.56
14          7 uostr_mean          1        tag   127.32
15         16    AGI_60m          3   chl_mean    99.12
16         14     AGI_0m          2        lat    88.90
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Yspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1221806
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7732709
max(e@TPR + e@TNR -1) #TSS
[1] 0.9742956
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Yspat_Ytag)
[1] 94.78331
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Yspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.115729 0.9733434 0.9965503 0.9956153         0.9118642 0.9478331

Summary table of results

output_sum <- read.csv(here("data/brt/mod_outputs/brt_bckg_output_summary.csv"))
kableExtra::kable(output_sum)
model percent_explained calc_deviance TPR_mean TSS AUC RMSE SpearmanCor C.index PredRatio PseudoR2
base_0m_Nspat_Ntag 58.62 0.634 0.723 0.7550 0.948 0.3096 0.793 0.948 0.9923 0.586
base_0m_Nspat_Ytag 83.37 0.286 0.745 0.9199 0.992 0.1920 0.926 0.992 0.9940 0.834
base_0m_Yspat_Ytag 87.20 0.232 0.747 0.9360 0.995 0.1710 0.942 0.995 0.9960 0.872
do_0m_Nspat_Ytag 75.41 0.393 0.740 0.8480 0.981 0.2400 0.881 0.981 0.9980 0.754
do_0m_Yspat_Ytag 77.14 0.366 0.741 0.8580 0.984 0.2300 0.890 0.984 0.9970 0.771
do_0m_60m_Nspat_Ytag 76.37 0.382 0.749 0.8550 0.982 0.2360 0.885 0.982 0.9990 0.764
do_0m_250m_Nspat_Ytag 77.05 0.374 0.741 0.8570 0.983 0.2340 0.886 0.983 0.9980 0.770
do_0m_60m_250m_Nspat_Ytag 76.97 0.376 0.741 0.8530 0.882 0.2350 0.885 0.982 0.9990 0.770
do_0m_60m_250m_Yspat_Ytag 78.28 0.356 0.742 0.8650 0.985 0.2270 0.893 0.985 0.9980 0.783
agi_0m_Nspat_Ytag 76.24 0.377 0.741 0.8660 0.984 0.2310 0.890 0.984 1.0000 0.762
agi_0m_Yspat_Ytag 77.78 0.351 0.743 0.8780 0.986 0.2230 0.898 0.986 1.0020 0.778
agi_0m_60m_Nspat_Ytag 76.21 0.377 0.741 0.8620 0.984 0.2310 0.890 0.984 1.0000 0.762
agi_0m_250m_Nspat_Ytag 76.55 0.371 0.742 0.8650 0.984 0.2300 0.891 0.984 1.0000 0.765
agi_0m_60m_250m_Nspat_Ytag 76.96 0.366 0.742 0.8650 0.985 0.2280 0.893 0.985 1.0000 0.770
agi_0m_60m_250m_Yspat_Ytag 78.38 0.345 0.743 0.8790 0.987 0.2210 0.900 0.987 1.0000 0.784

DO models w/o tag ID

Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that dissolved oxygen may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.

do_0m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[12])

# model performance 
ggBRT::ggPerformance(do_0m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.2188393
Correlation        0.9490943
AUC                0.9966000
Per.Expl          84.2140724
cvDeviance         0.5170811
cvCorrelation      0.8339445
cvAUC              0.9588700
cvPer.Expl        62.7004570
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Nspat_Ntag) 

             rel.inf
bathy_mean 36.618282
o2_mean_0m 29.909294
temp_mean   7.873891
chl_mean    5.305086
ssh_mean    4.223122
sal_mean    3.004185
vostr_mean  2.838490
mld_mean    2.084183
bathy_sd    1.974745
uostr_mean  1.913768
uo_mean     1.739428
vo_mean     1.407480
pred_var    1.108047
#explore partial plots
gbm.plot(do_0m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Nspat_Ntag)
do_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1          3  temp_mean          1 o2_mean_0m   689.53
2         11 bathy_mean          4   sal_mean   592.77
3         11 bathy_mean          9   ssh_mean   576.98
4         11 bathy_mean          5    uo_mean   442.14
5          9   ssh_mean          3  temp_mean   385.11
6         11 bathy_mean          3  temp_mean   378.70
7         11 bathy_mean          1 o2_mean_0m   266.27
8         12   bathy_sd          1 o2_mean_0m   255.59
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.3163543
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7429917
max(e@TPR + e@TNR -1) #TSS
[1] 0.8988065
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Nspat_Ntag)
[1] 84.21407
#eval 75/25
eval_7525_modified(do_0m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2078864 0.9108071 0.9855748 0.9981125         0.7717969 0.8421407
do_0m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[13])

# model performance 
ggBRT::ggPerformance(do_0m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.1949867
Correlation        0.9556768
AUC                0.9975000
Per.Expl          85.9346757
cvDeviance         0.4783362
cvCorrelation      0.8485144
cvAUC              0.9643500
cvPer.Expl        65.4953096
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Yspat_Ntag) 

              rel.inf
dist_coast 53.5696625
o2_mean_0m 12.6309567
lat         8.1195782
bathy_mean  5.6664282
chl_mean    3.8500282
temp_mean   2.9692000
sal_mean    2.4553503
ssh_mean    2.1991953
vostr_mean  1.6248422
mld_mean    1.5073425
uo_mean     1.3556893
bathy_sd    1.2144774
uostr_mean  1.0434857
vo_mean     0.9699671
pred_var    0.8237967
#explore partial plots
gbm.plot(do_0m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Yspat_Ntag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12 bathy_mean          4  temp_mean   969.31
2           4  temp_mean          2 o2_mean_0m   582.13
3          12 bathy_mean          1        lat   295.21
4          12 bathy_mean         10   ssh_mean   220.95
5           3   chl_mean          2 o2_mean_0m   208.59
6           4  temp_mean          1        lat   197.51
7          12 bathy_mean          6    uo_mean   180.50
8          13   bathy_sd          2 o2_mean_0m   168.48
9           3   chl_mean          1        lat   143.75
10         14 dist_coast          5   sal_mean   142.09
11         12 bathy_mean          5   sal_mean   139.44
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Yspat_Ntag, dat_test_do,
                     n.trees = do_0m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.2860822
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7444084
max(e@TPR + e@TNR -1) #TSS
[1] 0.9068106
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Yspat_Ntag)
[1] 85.93468
#eval 75/25
eval_7525_modified(do_0m_Yspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1961425 0.9209367 0.9877846 0.9968631         0.7936337 0.8593468
do_0m_60m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[11])

# model performance 
ggBRT::ggPerformance(do_0m_60m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.2152171
Correlation        0.9494120
AUC                0.9963000
Per.Expl          84.4753535
cvDeviance         0.4980739
cvCorrelation      0.8417192
cvAUC              0.9613600
cvPer.Expl        64.0715342
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_Nspat_Ntag) 

               rel.inf
bathy_mean  33.3337448
o2_mean_0m  29.2907706
o2_mean_60m 10.7154473
chl_mean     4.7874063
ssh_mean     4.4654220
temp_mean    4.2047433
sal_mean     2.8985582
vostr_mean   1.9514731
mld_mean     1.8318151
uo_mean      1.6194828
bathy_sd     1.5388993
uostr_mean   1.4127257
vo_mean      0.9997546
pred_var     0.9497570
#explore partial plots
gbm.plot(do_0m_60m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_Nspat_Ntag)
do_int$rank.list
   var1.index  var1.names var2.index var2.names int.size
1           3   temp_mean          1 o2_mean_0m   587.36
2          11  bathy_mean          9   ssh_mean   529.13
3          13 o2_mean_60m          3  temp_mean   467.19
4          11  bathy_mean          5    uo_mean   395.13
5          11  bathy_mean          3  temp_mean   317.18
6          11  bathy_mean          4   sal_mean   312.00
7           6  uostr_mean          1 o2_mean_0m   296.24
8          12    bathy_sd          1 o2_mean_0m   223.69
9          13 o2_mean_60m          2   chl_mean   166.11
10         11  bathy_mean          1 o2_mean_0m   151.42
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_60m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.3054413
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7433705
max(e@TPR + e@TNR -1) #TSS
[1] 0.9012114
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_Nspat_Ntag)
[1] 84.47535
#eval 75/25
eval_7525_modified(do_0m_60m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4350 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2039139 0.9142042 0.9862436 0.9962925         0.7796691 0.8447535
do_0m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[8])

# model performance 
ggBRT::ggPerformance(do_0m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.2335030
Correlation        0.9429315
AUC                0.9954000
Per.Expl          83.1563108
cvDeviance         0.5005420
cvCorrelation      0.8398630
cvAUC              0.9608000
cvPer.Expl        63.8935019
#relative influence of predictors
ggBRT::ggInfluence(do_0m_250m_Nspat_Ntag) 

                rel.inf
bathy_mean   30.0267521
o2_mean_0m   29.7310875
o2_mean_250m 17.3780027
temp_mean     3.9785508
chl_mean      3.7684071
sal_mean      2.9143743
ssh_mean      2.5775664
bathy_sd      1.7071494
vostr_mean    1.6822884
uo_mean       1.4379251
uostr_mean    1.4253686
mld_mean      1.3330125
vo_mean       1.1026202
pred_var      0.9368949
#explore partial plots
gbm.plot(do_0m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_250m_Nspat_Ntag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          11   bathy_mean          4   sal_mean   334.54
2           3    temp_mean          1 o2_mean_0m   330.27
3          11   bathy_mean          9   ssh_mean   277.86
4          13 o2_mean_250m          1 o2_mean_0m   276.54
5           6   uostr_mean          1 o2_mean_0m   255.46
6          11   bathy_mean          3  temp_mean   207.76
7           2     chl_mean          1 o2_mean_0m   190.07
8          12     bathy_sd          1 o2_mean_0m   186.35
9          13 o2_mean_250m          4   sal_mean   156.17
10         13 o2_mean_250m          3  temp_mean   151.82
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_250m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.3233148
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7424683
max(e@TPR + e@TNR -1) #TSS
[1] 0.8898093
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_250m_Nspat_Ntag)
[1] 83.15631
#eval 75/25
eval_7525_modified(do_0m_250m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
3850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2116599 0.9071004 0.9846599 0.9967308         0.7667759 0.8315631
do_0m_60m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[9])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.1967857
Correlation        0.9555813
AUC                0.9975000
Per.Expl          85.8049005
cvDeviance         0.4879500
cvCorrelation      0.8446379
cvAUC              0.9630100
cvPer.Expl        64.8018226
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Nspat_Ntag) 

               rel.inf
bathy_mean   29.784726
o2_mean_0m   29.575237
o2_mean_250m 12.535092
o2_mean_60m   6.692749
chl_mean      3.607960
temp_mean     3.236009
sal_mean      2.956732
ssh_mean      2.508817
vostr_mean    1.606424
bathy_sd      1.536286
uo_mean       1.440651
uostr_mean    1.322227
mld_mean      1.317766
vo_mean       1.024736
pred_var      0.854590
#explore partial plots
gbm.plot(do_0m_60m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Nspat_Ntag)
do_int$rank.list
   var1.index   var1.names var2.index  var2.names int.size
1           3    temp_mean          1  o2_mean_0m   575.57
2           2     chl_mean          1  o2_mean_0m   367.59
3          11   bathy_mean          4    sal_mean   247.44
4          11   bathy_mean          5     uo_mean   244.71
5          13  o2_mean_60m          3   temp_mean   233.27
6          11   bathy_mean          9    ssh_mean   222.57
7          11   bathy_mean          3   temp_mean   204.95
8          14 o2_mean_250m         13 o2_mean_60m   192.80
9          14 o2_mean_250m          1  o2_mean_0m   184.47
10         13  o2_mean_60m         11  bathy_mean   168.43
11         12     bathy_sd          1  o2_mean_0m   167.06
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_60m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.2915978
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7440062
max(e@TPR + e@TNR -1) #TSS
[1] 0.9065458
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Nspat_Ntag)
[1] 85.8049
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.1981623 0.9191668 0.9872286  0.999063         0.7896551 0.858049
do_0m_60m_250m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[10])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.1711730
Correlation        0.9637236
AUC                0.9984000
Per.Expl          87.6524677
cvDeviance         0.4688544
cvCorrelation      0.8531779
cvAUC              0.9656700
cvPer.Expl        66.1792818
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Yspat_Ntag) 

                rel.inf
dist_coast   51.2057041
o2_mean_0m   11.5940030
o2_mean_250m  7.6458020
lat           4.9904405
bathy_mean    3.8441712
o2_mean_60m   3.4353426
chl_mean      3.1827866
temp_mean     2.7383315
sal_mean      2.2099326
ssh_mean      1.9054953
mld_mean      1.3048380
vostr_mean    1.1565039
uo_mean       1.1379053
bathy_sd      1.1340787
uostr_mean    0.9609815
vo_mean       0.7913691
pred_var      0.7623142
#explore partial plots
gbm.plot(do_0m_60m_250m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Yspat_Ntag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          12   bathy_mean          4  temp_mean   460.19
2          12   bathy_mean          6    uo_mean   369.18
3           4    temp_mean          2 o2_mean_0m   294.93
4          15  o2_mean_60m          4  temp_mean   221.21
5           3     chl_mean          2 o2_mean_0m   213.95
6          16 o2_mean_250m          1        lat   194.30
7          12   bathy_mean          1        lat   176.52
8          12   bathy_mean          5   sal_mean   174.94
9          15  o2_mean_60m          5   sal_mean   157.19
10         12   bathy_mean          3   chl_mean   136.43
11          3     chl_mean          1        lat   135.67
12         15  o2_mean_60m         12 bathy_mean   121.03
13          7   uostr_mean          2 o2_mean_0m   114.49
14         12   bathy_mean         10   ssh_mean   109.97
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Yspat_Ntag, dat_test_do,
                     n.trees = do_0m_60m_250m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.2611485
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7455318
max(e@TPR + e@TNR -1) #TSS
[1] 0.9219847
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Yspat_Ntag)
[1] 87.65247
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Yspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1850589 0.9301022 0.9896335 0.9972148         0.8116198 0.8765247

AGI models w/o tag ID

Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that AGI may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.

agi_0m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[5])

# model performance 
ggBRT::ggPerformance(agi_0m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.2494636
Correlation        0.9389402
AUC                0.9946000
Per.Expl          82.0044167
cvDeviance         0.5227643
cvCorrelation      0.8310463
cvAUC              0.9575400
cvPer.Expl        62.2892911
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Nspat_Ntag) 

             rel.inf
bathy_mean 36.491740
temp_mean  21.375886
AGI_0m     10.198606
sal_mean    5.377754
ssh_mean    5.311421
uostr_mean  5.009516
chl_mean    4.871749
vostr_mean  3.562961
bathy_sd    2.025273
mld_mean    1.700623
uo_mean     1.632024
vo_mean     1.472134
pred_var    0.970312
#explore partial plots
gbm.plot(agi_0m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Nspat_Ntag)
agi_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1         12     AGI_0m          2  temp_mean  6302.78
2         10 bathy_mean          2  temp_mean   752.63
3         10 bathy_mean          8   ssh_mean   463.53
4         12     AGI_0m          8   ssh_mean   447.89
5          5 uostr_mean          2  temp_mean   358.75
6         10 bathy_mean          3   sal_mean   358.05
7         10 bathy_mean          4    uo_mean   261.61
8          8   ssh_mean          4    uo_mean   253.56
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.3265073
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7424835
max(e@TPR + e@TNR -1) #TSS
[1] 0.8904318
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Nspat_Ntag)
[1] 82.00442
#eval 75/25
eval_7525_modified(agi_0m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4100 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2116179 0.9077497 0.9853149 0.9994755         0.7644717 0.8200442
agi_0m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[6])

# model performance 
ggBRT::ggPerformance(agi_0m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.2061652
Correlation        0.9520613
AUC                0.9968000
Per.Expl          85.1278375
cvDeviance         0.4844196
cvCorrelation      0.8457186
cvAUC              0.9631500
cvPer.Expl        65.0553696
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Yspat_Ntag) 

              rel.inf
dist_coast 53.2176953
lat        10.6179286
AGI_0m      7.1940580
bathy_mean  6.9716928
temp_mean   5.2270863
chl_mean    3.4956185
sal_mean    2.7812823
ssh_mean    2.0438267
uo_mean     1.4980073
mld_mean    1.4534671
vostr_mean  1.4445966
uostr_mean  1.1261180
bathy_sd    1.0472647
vo_mean     1.0242153
pred_var    0.8571425
#explore partial plots
gbm.plot(agi_0m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Yspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  2156.60
2          13     AGI_0m          1        lat   801.90
3           3  temp_mean          1        lat   588.20
4          11 bathy_mean          3  temp_mean   459.87
5          14 dist_coast         10   mld_mean   407.88
6          13     AGI_0m         11 bathy_mean   278.43
7          11 bathy_mean          5    uo_mean   268.20
8          11 bathy_mean          2   chl_mean   191.28
9          11 bathy_mean          1        lat   162.84
10         14 dist_coast          4   sal_mean   158.85
11          6 uostr_mean          1        lat   153.97
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Yspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.2853753
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7446268
max(e@TPR + e@TNR -1) #TSS
[1] 0.907256
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Yspat_Ntag)
[1] 85.12784
#eval 75/25
eval_7525_modified(agi_0m_Yspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4400 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
      RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.195868 0.921452 0.9884097 0.9994483         0.7941426 0.8512784
agi_0m_60m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[4])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.2005887
Correlation        0.9556745
AUC                0.9975000
Per.Expl          85.5301104
cvDeviance         0.5068771
cvCorrelation      0.8393242
cvAUC              0.9601500
cvPer.Expl        63.4353465
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_Nspat_Ntag) 

              rel.inf
bathy_mean 33.7808593
temp_mean  20.4921335
AGI_0m     10.1395119
uostr_mean  6.1395169
AGI_60m     5.7829887
sal_mean    4.5642329
chl_mean    4.3701423
ssh_mean    3.8515309
vostr_mean  3.6254884
bathy_sd    1.6481303
uo_mean     1.6317208
mld_mean    1.5744189
vo_mean     1.4637452
pred_var    0.9355801
#explore partial plots
gbm.plot(agi_0m_60m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_Nspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  8122.33
2          10 bathy_mean          2  temp_mean   616.02
3          10 bathy_mean          8   ssh_mean   594.41
4          12     AGI_0m          8   ssh_mean   501.98
5          13    AGI_60m         10 bathy_mean   480.31
6          10 bathy_mean          3   sal_mean   340.70
7           5 uostr_mean          2  temp_mean   239.76
8           9   mld_mean          6    vo_mean   198.10
9          13    AGI_60m          1   chl_mean   194.10
10         13    AGI_60m          3   sal_mean   172.10
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_60m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.2889722
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7442668
max(e@TPR + e@TNR -1) #TSS
[1] 0.9016135
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_Nspat_Ntag)
[1] 85.53011
#eval 75/25
eval_7525_modified(agi_0m_60m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5000 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1969633 0.9206625 0.9883319  1.001621         0.7915479 0.8553011
agi_0m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[1])

# model performance 
ggBRT::ggPerformance(agi_0m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.2101199
Correlation        0.9516769
AUC                0.9969000
Per.Expl          84.8425585
cvDeviance         0.5009849
cvCorrelation      0.8408802
cvAUC              0.9607700
cvPer.Expl        63.8603934
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_250m_Nspat_Ntag) 

              rel.inf
bathy_mean 29.6981397
temp_mean  19.9110798
AGI_250m   13.1371225
AGI_0m      8.8212041
uostr_mean  6.9872820
sal_mean    4.6639715
ssh_mean    4.1044072
chl_mean    3.5367397
vostr_mean  2.1325830
bathy_sd    1.7771055
uo_mean     1.4966185
mld_mean    1.4421522
vo_mean     1.3513807
pred_var    0.9402136
#explore partial plots
gbm.plot(agi_0m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_250m_Nspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  7413.64
2          12     AGI_0m          8   ssh_mean   379.59
3           8   ssh_mean          4    uo_mean   376.09
4          13   AGI_250m          2  temp_mean   374.64
5          10 bathy_mean          2  temp_mean   352.02
6          13   AGI_250m         10 bathy_mean   349.05
7          10 bathy_mean          3   sal_mean   343.47
8          10 bathy_mean          4    uo_mean   264.93
9           4    uo_mean          2  temp_mean   209.75
10          5 uostr_mean          1   chl_mean   166.77
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_250m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.2950992
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7440749
max(e@TPR + e@TNR -1) #TSS
[1] 0.9009173
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_250m_Nspat_Ntag)
[1] 84.84256
#eval 75/25
eval_7525_modified(agi_0m_250m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4600 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2000762 0.9178963 0.9878278 0.9987535         0.7871282 0.8484256
agi_0m_60m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[2])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.1855561
Correlation        0.9598511
AUC                0.9980000
Per.Expl          86.6145154
cvDeviance         0.4907170
cvCorrelation      0.8451783
cvAUC              0.9623500
cvPer.Expl        64.6010907
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Nspat_Ntag) 

              rel.inf
bathy_mean 29.9734289
temp_mean  20.3673209
AGI_250m   12.2941229
AGI_0m      8.4707152
uostr_mean  6.0490757
sal_mean    3.9485657
AGI_60m     3.5161350
chl_mean    3.5092053
ssh_mean    3.0434925
vostr_mean  2.0985932
bathy_sd    1.6409157
vo_mean     1.4220106
uo_mean     1.4121966
mld_mean    1.4016687
pred_var    0.8525533
#explore partial plots
gbm.plot(agi_0m_60m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Nspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  5744.72
2          12     AGI_0m          8   ssh_mean   599.57
3          10 bathy_mean          3   sal_mean   441.15
4          10 bathy_mean          2  temp_mean   382.86
5          14   AGI_250m          2  temp_mean   328.55
6          13    AGI_60m         10 bathy_mean   279.25
7          14   AGI_250m         10 bathy_mean   268.58
8          10 bathy_mean          8   ssh_mean   257.93
9          10 bathy_mean          4    uo_mean   232.67
10         12     AGI_0m          3   sal_mean   185.87
11         12     AGI_0m         10 bathy_mean   173.22
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.2664466
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7453912
max(e@TPR + e@TNR -1) #TSS
[1] 0.9115428
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Nspat_Ntag)
[1] 86.61452
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1883042 0.927851 0.9903983  1.000388         0.8077969 0.8661452
agi_0m_60m_250m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[3])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.1679933
Correlation        0.9648664
AUC                0.9986000
Per.Expl          87.8814450
cvDeviance         0.4631752
cvCorrelation      0.8543076
cvAUC              0.9664000
cvPer.Expl        66.5878820
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Yspat_Ntag) 

              rel.inf
dist_coast 51.3624256
lat         9.3914556
AGI_0m      6.7567291
bathy_mean  5.5060080
temp_mean   4.8081540
AGI_250m    4.4615978
chl_mean    3.1022312
sal_mean    3.0090800
AGI_60m     2.3105716
ssh_mean    1.7128404
vostr_mean  1.2870285
mld_mean    1.2286325
uo_mean     1.2228660
uostr_mean  1.1546359
vo_mean     0.9892946
bathy_sd    0.9517202
pred_var    0.7447290
#explore partial plots
gbm.plot(agi_0m_60m_250m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Yspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  2706.06
2          13     AGI_0m          1        lat   480.87
3          13     AGI_0m         11 bathy_mean   300.50
4           6 uostr_mean          1        lat   293.87
5          11 bathy_mean          2   chl_mean   273.68
6          11 bathy_mean          3  temp_mean   269.12
7          16   AGI_250m         11 bathy_mean   265.48
8          14 dist_coast         10   mld_mean   219.03
9          11 bathy_mean          9   ssh_mean   209.13
10         13     AGI_0m          9   ssh_mean   176.75
11         11 bathy_mean          4   sal_mean   164.47
12          3  temp_mean          1        lat   147.70
13         11 bathy_mean          5    uo_mean   125.32
14         11 bathy_mean          1        lat   120.91
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Yspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.2466946
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7463962
max(e@TPR + e@TNR -1) #TSS
[1] 0.9225519
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Yspat_Ntag)
[1] 87.88144
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Yspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1793447 0.9348399 0.9915013 0.9995818         0.8220452 0.8788144

Summary table of results

output_sum_Ntag <- read.csv(here("data/brt/mod_outputs/brt_bckg_output_summary_Ntag.csv"))
kableExtra::kable(output_sum_Ntag)
model percent_explained calc_deviance TPR_mean TSS AUC RMSE SpearmanCor C.index PredRatio PseudoR2
base_0m_Nspat_Ntag 58.620 0.634 0.723 0.755 0.948 0.3096 0.793 0.9480 0.9923 0.586
do_0m_Nspat_Ntag 55.216 0.687 0.714 0.699 0.930 0.3280 0.760 0.9300 1.0000 0.552
do_0m_Yspat_Ntag 59.744 0.631 0.720 0.727 0.942 0.3120 0.784 0.9423 1.0000 0.597
do_0m_60m_Nspat_Ntag 57.007 0.666 0.717 0.712 0.935 0.3220 0.769 0.9350 1.0000 0.570
do_0m_250m_Nspat_Ntag 57.070 0.665 0.717 0.717 0.935 0.3220 0.769 0.9350 1.0000 0.571
do_0m_60m_250m_Nspat_Ntag 57.650 0.659 0.718 0.718 0.937 0.3200 0.772 0.9370 1.0000 0.577
do_0m_60m_250m_Yspat_Ntag 60.130 0.629 0.721 0.729 0.943 0.3120 0.785 0.9420 1.0000 0.601
agi_0m_Nspat_Ntag 54.264 0.695 0.714 0.700 0.929 0.3300 0.757 0.9290 0.9970 0.543
agi_0m_Yspat_Ntag 58.310 0.638 0.720 0.729 0.942 0.3150 0.782 0.9420 0.9990 0.583
agi_0m_60m_Nspat_Ntag 55.293 0.685 0.715 0.703 0.931 0.3270 0.761 0.9310 0.9990 0.553
agi_0m_250m_Nspat_Ntag 56.600 0.663 0.718 0.714 0.937 0.3210 0.771 0.9370 0.9990 0.566
agi_0m_60m_250m_Nspat_Ntag 59.120 0.633 0.721 0.731 0.944 0.3130 0.785 0.9440 0.9990 0.591
agi_0m_60m_250m_Yspat_Ntag 60.510 0.608 0.724 0.746 0.949 0.3060 0.796 0.9490 0.9990 0.605